A Hybrid Sampling Scheme for Triangle Counting
Authors
Abstract
We study the problem of estimating the number of triangles in a graph stream. No streaming algorithm can achieve sublinear space on all graphs, so methods in this area bound the space in terms of parameters of the input graph, such as the maximum number of triangles sharing a single edge. We give a sampling algorithm that is additionally parameterized by the maximum number of triangles sharing a single vertex. Our bound matches the best known turnstile results on all graphs, and achieves better performance on simple graphs like G(n, p) or a set of independent triangles. We complement the upper bound with a lower bound showing that no sampling algorithm can do better on those graphs by more than a log factor. In particular, any insertion stream algorithm must use √T space when all the triangles share a common vertex, and any sampling algorithm must take T^{1/3} samples when all the triangles are independent. We add another lower bound, also matching our algorithm’s performance, which applies to all graph classes. This lower bound covers “triangle-dependent” sampling algorithms, a subclass that includes our algorithm and all previous sampling algorithms for the problem. Finally, we show how to generalize our algorithm to count arbitrary subgraphs of constant size.
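To make the idea of a sampling-based triangle estimator concrete, here is a minimal sketch of plain uniform edge sampling: keep each streamed edge independently with probability p, count the triangles among the kept edges, and rescale by 1/p^3. This is not the hybrid scheme the abstract describes, which is not specified here; it is a simpler, well-known baseline, and the names `count_triangles`, `estimate_triangles`, `p`, and `seed` are illustrative assumptions. It also hints at why independent triangles are hard for sampling: a triangle survives only if all three of its edges are kept, which happens with probability p^3.

```python
import random


def count_triangles(edges):
    """Exactly count triangles in an undirected simple graph given as (u, v) pairs."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    total = 0
    for u, neighbours in adj.items():
        for v in neighbours:
            if u < v:  # visit each undirected edge once
                total += len(adj[u] & adj[v])  # common neighbours close a triangle
    return total // 3  # each triangle is counted once per edge


def estimate_triangles(edge_stream, p, seed=None):
    """Baseline estimator (assumption, not the paper's algorithm):
    keep each edge with probability p, then rescale the sampled count by 1/p^3."""
    rng = random.Random(seed)
    sample = [edge for edge in edge_stream if rng.random() < p]
    return count_triangles(sample) / p ** 3


if __name__ == "__main__":
    # Five vertex-disjoint triangles on vertices 0..14.
    stream = []
    for i in range(5):
        a, b, c = 3 * i, 3 * i + 1, 3 * i + 2
        stream += [(a, b), (b, c), (a, c)]
    print(count_triangles(stream))
    print(estimate_triangles(stream, p=0.8, seed=1))
```

With p = 1 every edge is kept and the estimate equals the exact count. For smaller p the estimate is unbiased, but its variance grows quickly, which is the kind of behavior that graph-dependent parameters in the abstract are meant to control.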
Similar resources
Efficient Algorithms for Approximate Triangle Counting
Counting the number of triangles in a graph has many important applications in network analysis. Several frequently computed metrics like the clustering coefficient and the transitivity ratio need to count the number of triangles in the network. Furthermore, triangles are one of the most important graph classes considered in network mining. In this paper, we present a new randomized algorithm f...
Approximate Triangle Counting
Triangle counting is an important problem in graph mining. Clustering coefficients of vertices and the transitivity ratio of the graph are two metrics often used in complex network analysis. Furthermore, triangles have been used successfully in several real-world applications. However, exact triangle counting is an expensive computation. In this paper we present the analysis of a practical samp...
On Sampling from Massive Graph Streams
We propose Graph Priority Sampling (GPS), a new paradigm for order-based reservoir sampling from massive graph streams. GPS provides a general way to weight edge sampling according to auxiliary and/or size variables so as to accomplish various estimation goals of graph properties. In the context of subgraph counting, we show how edge sampling weights can be chosen so as to minimize the estimati...
A geometric diagram and hybrid scheme for triangle subdivision
We introduce a geometrical diagram to study the improvement in shape of triangles generated by iterative application of triangle subdivision. The four Triangles Longest Edge (4TLE) subdivision pattern and a new hybrid 4T Longest-Edge/Self-Similar (hybrid 4TLE-SS) scheme are investigated in this way. The map diagram provides a convenient way to visualize the evolution and migration of element sh...
Triadic Measures on Graphs: The Power of Wedge Sampling
Graphs and networks are used to model interactions in a variety of contexts, and there is a growing need to be able to quickly assess the qualities of a graph in order to understand its underlying structure. Some of the most useful metrics are triangle based and give a measure of the connectedness of “friends of friends.” Counting the number of triangles in a graph has, therefore, received cons...